CS180 - Project 3: [Auto]Stitching Photo Mosaics¶
Part 1¶
A.1: Shoot the Pictures¶
Below are a few pairs of sunset lake photos taken by Joel Castro on a trip to eternal-pond. Sets of images related by projective transformations share a fixed center of projection, differing only by the rotation of the camera. Each pair of images was taken at a different FOV.
![]() |
![]() |
|---|---|
| (1) Lake Left (fov=60) | (2) Lake Right (fov=60) |
![]() |
![]() |
| (3) Lake Left (fov=30) | (4) Lake Right (fov=30) |
A.2: Recover Homographies¶
Before warping these images into alignment, it is necessary to recover the parameters of the transformation between each pair of images. In our case, the transformation is a homography: $p' = Hp$, where $H$ is a 3x3 matrix with 8 degrees of freedom (the lower-right entry is a scaling factor and can be set to 1). One way to recover the homography is from a set of pairs of corresponding points taken from the two images. To do this, we will write a function of the form:
H = computeH(im1_pts,im2_pts)
where im1_pts and im2_pts are n-by-2 matrices holding the locations of n point correspondences from the two images and H is the recovered 3x3 homography matrix. In order to compute the entries of $H$, we will set up a linear system of 2n equations (i.e. a matrix equation of the form $Ah = b$, where $h$ is a vector holding the 8 unknown entries of $H$). If n = 4, the system can be solved exactly. However, with only four points the homography recovery will be very unstable and prone to noise, so more than 4 correspondences should be provided, producing an overdetermined system which should be solved using least squares.
We begin by manually determining pixel-coordinate correspondences for 12 points in photographs (1) and (2), using a correspondence tool:
| # | (1) im1Points (x, y) | (2) im2Points (x, y) |
|---|---|---|
| 1 | (1547, 902) | (2375, 934) |
| 2 | (1622, 932) | (2496, 972) |
| 3 | (1643, 979) | (2531, 1033) |
| 4 | (1736, 925) | (2687, 967) |
| 5 | (1809, 842) | (2823, 850) |
| 6 | (1793, 818) | (2789, 813) |
| 7 | (1821, 787) | (2843, 771) |
| 8 | (1551, 709) | (2369, 696) |
| 9 | (1577, 673) | (2399, 651) |
| 10 | (1533, 689) | (2337, 672) |
| 11 | (1455, 736) | (2231, 739) |
| 12 | (1506, 862) | (2312, 885) |
Plotting the correspondences:

We can use the pinhole camera model to explain the projections from the world onto these two camera image planes.
![]() |
|---|
| Rotation Homography figure 41.4 from MIT visionbook |
Let our left image be Camera 1 and our right image be Camera 2. The pinhole camera model gives that the projection of a point $P_w$ in our 3D world onto the image plane of Camera 1 is

$$p_1 = K \, [\, I \mid \mathbf{0} \,] \, P_w$$

where $K$ is the camera's intrinsics matrix, and the world-to-camera transformation matrix is $[\, I \mid \mathbf{0} \,]$ since, for convenience, the world frame is set to the pinhole of Camera 1 (thereby sharing Camera 1's coordinate frame).

Similarly, Camera 2 can be modeled as

$$p_2 = K \, [\, R \mid \mathbf{0} \,] \, P_w$$

where the only difference is that the world-to-camera transformation contains the rotation matrix $R$ representing the rotation of the camera between the two photos. The camera intrinsics $K$ are the same.

We can use the shared world coordinates of these two mapping functions to relate pixel positions in Camera 1 to pixel positions in Camera 2. Since the translation is zero, $p_1 \sim K X$ and $p_2 \sim K R X$ for the 3D point $X$ (the first three coordinates of $P_w$), so

$$p_2 \sim K R K^{-1} p_1$$

Specifically, let's rewrite that last relation by letting $H = K R K^{-1}$:

$$p_2 = H p_1$$
However, it's hard to say exactly how much we rotated by, and what if we didn't know the intrinsics of our camera? Hence we don't know $H$ directly; instead, we will solve for it using what we do know: the point correspondences. We begin by expanding $p_2 = H p_1$ in homogeneous coordinates:

$$\begin{bmatrix} w x' \\ w y' \\ w \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Dividing through by the scale $w = h_{31}x + h_{32}y + h_{33}$ gives our image1-to-image2 mapping:

$$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$$

Substituting $h_{33} = 1$, expanding terms, and moving all terms to one side:

$$h_{11}x + h_{12}y + h_{13} - h_{31}x x' - h_{32}y x' = x'$$

$$h_{21}x + h_{22}y + h_{23} - h_{31}x y' - h_{32}y y' = y'$$

In other words, if two points are related to one another by the homography matrix $H$, they satisfy the above equalities. Let's rewrite this by defining a matrix $A$ and vector $h = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32})^\top$:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\ 0 & 0 & 0 & x & y & 1 & -x y' & -y y' \end{bmatrix} h = \begin{bmatrix} x' \\ y' \end{bmatrix}$$

However, a single correspondence gives only two equations for our 8 degrees of freedom (not 9, since an entry of $H$ is a scaling factor and can be set to 1), so this system is underdetermined. To account for this, we stack the equations for $N$ pairings into a $2N \times 8$ matrix $A$. If N = 4, the system can be solved exactly. However, with only four points the homography recovery will be very unstable and prone to noise, so more than 4 correspondences should be provided, producing an overdetermined system which can be solved using least squares.
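The computeH routine described above might be sketched as follows with NumPy; the implementation details (loop structure, use of `np.linalg.lstsq`) are my own rather than the course-provided code:

```python
import numpy as np

def computeH(im1_pts, im2_pts):
    """Recover the 3x3 homography H mapping im1_pts -> im2_pts.

    im1_pts, im2_pts: n-by-2 arrays of (x, y) correspondences, n >= 4.
    Sets h33 = 1 and solves the (over)determined 2n x 8 system in a
    least-squares sense.
    """
    n = len(im1_pts)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(im1_pts, im2_pts)):
        # Row for x' = (h11 x + h12 y + h13) / (h31 x + h32 y + 1)
        A[2 * i] = [x, y, 1, 0, 0, 0, -xp * x, -xp * y]
        b[2 * i] = xp
        # Row for y' = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y]
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exact correspondences the least-squares solution recovers the true homography; with noisy clicked points it finds the best fit over all 2n equations.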
Feeding our 12 points through such a system yields the following homography matrix:
A.3: Warp the Images¶
Now we can apply our homography to the second image to transform it so that all corresponding points line up. We will compare two methods for interpolating warped images:
- Nearest Neighbor Interpolation: Round coordinates to the nearest pixel value
- and Bilinear Interpolation: Use weighted average of four neighboring pixels.
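The two sampling schemes can be sketched as below; the function names and the coordinate convention (xs for columns, ys for rows) are my own:

```python
import numpy as np

def sample_nearest(img, xs, ys):
    """Nearest neighbor: round source coordinates to the closest pixel."""
    r = np.clip(np.round(ys).astype(int), 0, img.shape[0] - 1)
    c = np.clip(np.round(xs).astype(int), 0, img.shape[1] - 1)
    return img[r, c]

def sample_bilinear(img, xs, ys):
    """Bilinear: weighted average of the four surrounding pixels."""
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    dx, dy = xs - x0, ys - y0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x0 + 1]
    bot = (1 - dx) * img[y0 + 1, x0] + dx * img[y0 + 1, x0 + 1]
    return (1 - dy) * top + dy * bot
```

In an inverse warp, each output pixel is mapped through $H^{-1}$ to fractional source coordinates, which are then fed to one of these samplers.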
Take the following images, which contain perspective-skewed squares/rectangles:
![]() |
![]() |
|---|---|
| Forest Painting | Black Sheep |
Let's verify our warping is working correctly by performing rectification. That is, we will take a perspective-skewed square in these images and flatten it to a square image (with corners [[0,1080],[0,0],[1080,0],[1080,1080]]).
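A minimal sketch of computing the rectifying homography, assuming the four clicked corners are supplied in the same order as the target square's corners (`rectify_H` is a hypothetical helper name; the actual warp would then inverse-map output pixels through this H):

```python
import numpy as np

def rectify_H(quad_corners, size=1080):
    """Homography sending the 4 clicked corners of a skewed square to an
    axis-aligned size x size square (corner order assumed to match the
    target order [[0,size],[0,0],[size,0],[size,size]])."""
    dst = np.array([[0, size], [0, 0], [size, 0], [size, size]], dtype=float)
    A, b = [], []
    for (x, y), (xp, yp) in zip(quad_corners, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    # 4 correspondences -> 8 equations, 8 unknowns: exactly determined
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)
```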


Looking rectified! Now that we know our warping works, let's apply this technique to align our two original lake images.
Hmmm, it appears that some extreme warping is going on... It turns out that Minecraft has significant "lens" distortion at high enough FOV settings. Let's try our other set of shots at the lake at a lower FOV (FOV=30).
Moreover, we'd like to overlay the two images, not simply have one override the other. We will do this in the next part.
A.4: Blend the Images into a Mosaic¶
In this section, we will warp the images so they're registered and create an image mosaic. Instead of having one picture overwrite the other, which would lead to strong edge artifacts, we will use weighted averaging. To do this, we will leave one center image unwarped and warp all other images into the center image's projection, then blend the images together to produce a single image. Concretely, for every pixel in the output image we keep a count of how many images contributed to it and an accumulation of pixel values; when we're done accumulating images, we divide the accumulated sum by the count to average the pixels that contributed to each output pixel.
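The accumulation scheme above can be sketched as follows, assuming the images have already been warped onto a shared mosaic canvas with validity masks (names are my own):

```python
import numpy as np

def blend_average(warped_images, masks):
    """Average-blend pre-warped images on a shared canvas.

    warped_images: list of HxWx3 float arrays already in the mosaic frame.
    masks: matching list of HxW booleans marking valid (non-empty) pixels.
    """
    acc = np.zeros_like(warped_images[0], dtype=float)    # running pixel sum
    count = np.zeros(warped_images[0].shape[:2], float)   # contributors per pixel
    for img, m in zip(warped_images, masks):
        acc[m] += img[m]
        count += m
    count = np.maximum(count, 1)   # avoid divide-by-zero where no image lands
    return acc / count[..., None]
```

Pixels covered by a single image are passed through unchanged; overlap regions become the mean of their contributors, which softens seams.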
Let's take a look at the results:
![]() |
|---|
| Lake Mosaic |
![]() |
![]() |
|---|
| Lamp Post Left | Lamp Post Right |
![]() |
|---|
| Lamp Mosaic |
![]() |
![]() |
![]() |
|---|
| Villager Left | Villager Root | Villager Right |
![]() |
|---|
| Villager Mosaic |
Check out the ghosting of some of the mobs that moved around in the Villager Mosaic, so cool!
This concludes part 1.
Part 2¶
B.1: Harris Corner Detection¶
We will start with the Harris interest point detector (Section 2 of "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al.). We won't worry about multi-scale detection and will use just a single scale. We also won't worry about sub-pixel accuracy.
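A single-scale Harris response in the harmonic-mean form used by Brown et al. might be sketched as below. This is an illustrative reimplementation assuming SciPy is available, not the course-provided detector:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(gray, sigma=1.0):
    """Single-scale Harris corner strength map for a grayscale float image."""
    Ix = np.gradient(gray, axis=1)
    Iy = np.gradient(gray, axis=0)
    # Smooth the outer products of the gradients (structure tensor entries)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-8)   # harmonic-mean strength f = det / trace
```

Local maxima of this response map are then taken as the interest points.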
Next we will implement Adaptive Non-Maximal Suppression (ANMS, Section 3).
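ANMS can be sketched as below, using the paper's robustness constant c = 0.9; the vectorized all-pairs formulation and names are my own, and it is fine for a few thousand corners:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression (Brown et al., Sec. 3).

    For each corner, the suppression radius is the distance to the nearest
    corner that is sufficiently stronger (f_i < c_robust * f_j); keep the
    n_keep corners with the largest radii for an even spatial spread.
    coords: n x 2 array of (row, col); strengths: n array of responses.
    """
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # pairwise dist^2
    stronger = strengths[None, :] * c_robust > strengths[:, None]  # j suppresses i
    d2 = np.where(stronger, d2, np.inf)
    radii = d2.min(axis=1)   # squared radii; inf for the globally strongest corner
    order = np.argsort(-radii)
    return coords[order[:n_keep]]
```

Since only the ordering of the radii matters, the squared distances are never square-rooted.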
B.2: Feature Descriptor Extraction¶
Next we will implement Feature Descriptor extraction (Section 4 of the paper). We won't worry about rotation-invariance – just extracting axis-aligned 8x8 patches.
Note: It’s extremely important to sample these patches from the larger 40x40 window to have a nice big blurred descriptor. Don’t forget to bias/gain-normalize the descriptors. We will ignore the wavelet transform section.
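The extraction step can be sketched as follows, assuming the image has already been blurred and the corner is at least 20 px from the border (a spacing of 5 turns the 40x40 window into an 8x8 patch; the function name is my own):

```python
import numpy as np

def extract_descriptor(gray, r, c, spacing=5):
    """Axis-aligned MOPS-style descriptor: sample an 8x8 patch from the
    40x40 window around (r, c) at the given spacing, then bias/gain
    normalize (zero mean, unit standard deviation)."""
    win = gray[r - 20:r + 20, c - 20:c + 20]   # 40x40 window
    patch = win[::spacing, ::spacing]          # 8x8 subsample
    patch = patch - patch.mean()               # bias normalize
    return (patch / (patch.std() + 1e-8)).ravel()   # gain normalize, 64-vector
```

Bias/gain normalization makes the descriptor invariant to affine changes in image brightness between the two photos.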
B.3: Feature Matching¶
Next, we will implement Feature Matching (Section 5 of the paper). That is, we will need to find pairs of features that look similar and are thus likely to be good matches. For thresholding, we will use the simpler approach due to Lowe of thresholding on the ratio between the first and the second nearest neighbors. We will consult Figure 6b in the paper for picking the threshold (but we will ignore Section 6 of the paper).
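The ratio test can be sketched as below (a brute-force all-pairs version; names are my own):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.5):
    """Lowe-style matching: accept descriptor i's nearest neighbor j only
    when dist(i, 1-NN) < ratio * dist(i, 2-NN).
    desc1: m x d, desc2: n x d arrays (n >= 2); returns (i, j) index pairs."""
    dist = np.sqrt(((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1))
    nn = np.argsort(dist, axis=1)   # desc2 indices sorted by distance, per row
    matches = []
    for i in range(len(desc1)):
        first, second = nn[i, 0], nn[i, 1]
        if dist[i, first] < ratio * dist[i, second]:
            matches.append((i, int(first)))
    return matches
```

A low ratio like 0.5 keeps only matches whose best candidate is much closer than the runner-up, trading quantity for reliability.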
Feature extraction returned 500 and 362 interest points for the two images. Matching features with a harsh threshold of 0.5 reduced this to 24 pairs of points (i.e. 48 points across the two images).
B.4: RANSAC for Robust Homography¶
For step 4, we will use 4-point RANSAC to compute robust homography estimates. Then we will produce mosaics by adapting our code from Part A.
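The RANSAC loop can be sketched as below (a self-contained sketch with an exact 4-point DLT solve inlined; the iteration count, error threshold, and names are my own choices):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iter=1000, eps=2.0, seed=0):
    """4-point RANSAC: repeatedly fit an exact homography to a random
    4-correspondence sample, count inliers whose reprojection error is
    under eps pixels, and keep the largest inlier set.
    pts1, pts2: n x 2 matched points. Returns the best inlier mask."""
    rng = np.random.default_rng(seed)
    n = len(pts1)
    best = np.zeros(n, bool)
    hom1 = np.hstack([pts1, np.ones((n, 1))])   # homogeneous pts1
    for _ in range(n_iter):
        idx = rng.choice(n, 4, replace=False)
        A, b = [], []
        for (x, y), (xp, yp) in zip(pts1[idx], pts2[idx]):
            A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
            A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
        try:
            h = np.linalg.solve(np.array(A), np.array(b))
        except np.linalg.LinAlgError:   # degenerate (e.g. collinear) sample
            continue
        H = np.append(h, 1.0).reshape(3, 3)
        proj = hom1 @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best.sum():
            best = inliers
    return best
```

A final least-squares homography is then refit on all inliers, and the Part A warping/blending code takes over from there.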
Compared to the stitching of these two images made by manually selecting points in Part 1, this version looks significantly more correct. Where before we saw a ghosting tree and slight misalignment in the water levels, this version has all of that corrected. It is, however, a bit stretched out, likely due to the significant camera rotation.
Here is the villager mosaic from before run through the auto-stitching pipeline. This result came out worse than its manual counterpart: it looks like RANSAC matched points in the field and similar grass paths more strongly than the well (which didn't have much overlap). If you didn't know what the original images looked like, this could easily fool you; edge-orientation detection is very powerful!
And finally, here is the mosaic of the lamp with the same base images as before. Unfortunately, it looks like it matched the wrong points. I will investigate this further, but unfortunately not before the deadline, because I have 170 HW due today as well. It's been real :D.














